Punctured Elias Codes for variable-length coding of the integers
Abstract
The compact representation of integers is an important problem in areas such as data compression, especially where there is a nearly monotonic decrease in the likelihood of larger integers. While many different representations have been described, it is not always clear in which circumstances a particular code is to be preferred. This report introduces a variant of the Elias γ code which is shown to be better than other codes for some distributions.

1. Compact integer representations

The efficient representation of symbols of differing probabilities is one of the classical problems of information theory and coding theory, with efficient solutions known since the early 1950s (Shannon-Fano and Huffman codes [9]). In traditional, non-adaptive coding we assume a priori probabilities of the input symbols and construct suitable codes to represent those symbols efficiently. There is no necessary or simple relation between a symbol and its representation.

Here we are concerned with a different problem, especially as the symbol alphabet (integers of arbitrary upper bound) may be so large as to preclude the formal construction of an efficient code. Given an arbitrary integer, we wish to represent it as compactly as possible, preferably by an algorithm which recognises only the magnitude and bit pattern of the integer (no table lookup or mapping needed). Equally, a simple algorithm should be able to recover an integer from an input bit stream, even if that particular integer has never been seen before. The binary representation of the integer is often visible within the representation, and other information is appended to indicate the length or precision.

Many variable-length representations have been described; here we concentrate on just a few, emphasising those which have a simple relation between code and value and are instantaneous or nearly so.
Following Elias [3], we first introduce two preliminary representations which are relatively unimportant per se, but are used in many other codes.

• α(n) is the unary representation: n 0’s followed by a 1 (or n 1’s followed by a 0).
• β(n) is the natural binary representation of n, from the most significant 1.

1.1 Levenstein and Elias γ codes

These codes were first described by Levenstein [11], but the later description by Elias [3] is generally used in the English language literature. Elias describes a whole series of codes, starting with the α and β codes already described. His γ code writes the bits of the β code (the binary representation) in reverse order, with each preceded by a flag bit. All except the last flag bit are 0; the last flag bit is a 1 and implies the most-significant 1. Thus 13 is represented as 0100011, where the flag bits are the first, third, fifth and seventh bits.

The γ' code is a permutation of the γ code, with the flag bits (an α code) preceding the data bits (a β code). With this code, 13 is written as 0001101 (the terminator of the α code doubles as the first bit of the β code). For most of this document the term “Elias γ code” will be used interchangeably for the two variants; often it will actually mean the γ' code.

An integer of N significant bits is represented in 2N−1 bits, or an integer n is represented by 2⌊log n⌋+1 bits.¹ (It is convenient to ignore the floor and ceiling operators in future discussions to simplify the mathematics. Most of the discussion involves only order of magnitude considerations, or averages over many symbols, so that precise values are relatively unimportant. We therefore say that an Elias code represents an integer n in 2 log n + 1 bits.)

¹ All logarithms will be to base 2, without stating an explicit base.

Tech Rep 137, December 5, 1996
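The α, β, γ and γ' constructions described above can be sketched in Python as follows. This is a minimal illustration rather than code from the report: the function names (alpha, beta, gamma, gamma_prime, decode_gamma_prime) are hypothetical, and codes are held as strings of '0' and '1' characters for readability instead of packed bits.

```python
from typing import Tuple


def alpha(n: int) -> str:
    """Unary code: n 0's followed by a terminating 1."""
    return "0" * n + "1"


def beta(n: int) -> str:
    """Natural binary representation of n >= 1, from the most significant 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return bin(n)[2:]


def gamma(n: int) -> str:
    """Elias gamma code: data bits in reverse order, each preceded by a flag.

    Every flag except the last is 0; the final flag is 1 and stands for the
    most-significant 1 of the binary representation.
    """
    bits = beta(n)
    # bits[1:] drops the leading 1; reversed() emits the low-order bit first.
    body = "".join("0" + b for b in reversed(bits[1:]))
    return body + "1"


def gamma_prime(n: int) -> str:
    """Elias gamma' code: unary length prefix followed by the beta code.

    The terminating 1 of the alpha code doubles as the first bit of beta(n),
    so the prefix is written as len(beta(n)) - 1 zeros.
    """
    bits = beta(n)
    return alpha(len(bits) - 1)[:-1] + bits


def decode_gamma_prime(stream: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one gamma' codeword starting at pos.

    Returns (value, position just after the codeword). Assumes the stream
    is well formed; a truncated stream raises IndexError or ValueError.
    """
    zeros = 0
    while stream[pos + zeros] == "0":
        zeros += 1
    start = pos + zeros
    end = start + zeros + 1
    return int(stream[start:end], 2), end
```

With these definitions, gamma(13) gives 0100011 and gamma_prime(13) gives 0001101, the two codewords quoted in the text, and each codeword for an integer of N significant bits is 2N−1 bits long.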
n     code             n     code
1     1                11    0001011
2     010              12    0001100
3     011              13    0001101
4     00100            14    0001110
5     00101            15    0001111
6     00110            16    000010000
7     00111            17    000010001
8     0001000          18    000010010
9     0001001          19    000010011
10    0001010          20    000010100
100   0000001100100    250   000000011111010

Table 1. Example of Elias' γ' code.

The γ code can be extended to higher number bases where such granularity is appropriate. For example, numbers can be held in byte units, with each 8-bit byte containing 1 flag bit (last-byte/more-to-come) and 7 data bits, to give a base-128 code.

1.2 Elias ω and Even-Rodeh codes

All of the codes described here have a length part and a value part. In the γ code the length is given in unary; a natural progression is to specify the length itself in a variable-length code. Elias does this with his δ code, using a γ code for the length, but quickly proceeds to his ω codes. Some very similar codes were described by Even and Rodeh [4] and it is convenient to treat the two in parallel. Both of the codes have the value (as a β code) preceded by a series of length indications and followed by a 0 as a terminating comma.

Value   Elias ω code   Even-Rodeh code
0       -              000
1       0              001
2       10 0           010
3       11 0           011
4       10 100 0       100 0
7       10 111 0       111 0
8       11 1000 0      10 ...
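The Elias ω construction described above (a β code for the value, preceded by a chain of length indications and followed by a 0 as comma) can be sketched as below. This is an illustrative sketch under the usual definition of the ω code, not code from the report; the names omega_encode and omega_decode are hypothetical, and codes are again strings of '0' and '1' characters.

```python
from typing import Tuple


def omega_encode(n: int) -> str:
    """Elias omega code of n >= 1, built from right to left.

    Start with the terminating 0. While n > 1, prepend the binary form of n,
    then replace n by (number of bits just written) - 1, which becomes the
    next length indication to the left.
    """
    if n < 1:
        raise ValueError("n must be a positive integer")
    code = "0"
    while n > 1:
        bits = bin(n)[2:]
        code = bits + code
        n = len(bits) - 1
    return code


def omega_decode(stream: str, pos: int = 0) -> Tuple[int, int]:
    """Decode one omega codeword starting at pos.

    Returns (value, position just after the codeword). Each group begins
    with a 1 and is n + 1 bits long, where n is the value of the previous
    group (initially 1). A leading 0 is the terminating comma.
    """
    n = 1
    while stream[pos] == "1":
        width = n + 1
        n = int(stream[pos:pos + width], 2)
        pos += width
    return n, pos + 1
```

Ignoring the spaces used for grouping in the table, omega_encode reproduces the Elias ω column for the values 1, 2, 3, 4, 7 and 8; for instance, omega_encode(8) yields 1110000, that is, 11 1000 0.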
Similar resources
Additive Variable Length Codes for the Integers
This paper introduces a new family of variable length codes for the integers, initially based on the Goldbach conjecture that every even integer is the sum of two primes. For an even integer we decompose the value into its two constituent primes and encode the ordinal numbers of those primes with an Elias gamma code. The method is then elaborated to handle odd integers. The paper then develops ...
Using an innovative coding algorithm for data encryption
This paper discusses the problem of using data compression for encryption. We first propose an algorithm for breaking a prefix-coded file by enumeration. Based on the algorithm, we respectively analyze the complexity of breaking Huffman codes and Shannon-Fano-Elias codes under the assumption that the cryptanalyst knows the code construction rule and the probability mass function of the source. ...
Universal codes for finite sequences of integers drawn from a monotone distribution
We offer two noiseless codes for blocks of integers Xn = (X1, . . . , Xn). We provide explicit bounds on the relative redundancy that are valid for any distribution F in the class of memoryless sources with a possibly infinite alphabet whose marginal distribution is monotone. Specifically we show that the expected code length L(Xn) of our first universal code is dominated by a linear function o...
Compressing Integers for Fast File Access
Fast access to files of integers is crucial for the efficient resolution of queries to databases. Integers are the basis of indexes used to resolve queries, for example, in large internet search systems and numeric data forms a large part of most databases. Disk access costs can be reduced by compression, if the cost of retrieving a compressed representation from disk and the CPU cost of decodi...
Compress Integer with Fibonacci Code and Gamma Code Using Variable-Length code in JPEG2000
To overcome many drawbacks in the current JPEG standard for still image compression, a new standard, JPEG2000, is under development by the International Standard Organization. The compact representation of integers is an important problem in areas such as data compression, especially where there is a nearly monotonic decrease in the likelihood of large integers. While describing many representation,...
Publication date: 1996